Random Multiclass Classification: Generalizing Random Forests to Random MNL and Random NB

Authors

  • Anita Prinzie
  • Dirk Van den Poel
Abstract

Random Forests (RF) is a successful classifier exhibiting performance comparable to AdaBoost while being more robust. The exploitation of two sources of randomness, random inputs (bagging) and random features, makes RF an accurate classifier in several domains. We hypothesize that methods other than classification or regression trees could also benefit from injecting randomness. This paper generalizes the RF framework to other multiclass classification algorithms, such as the well-established MultiNomial Logit (MNL) and Naive Bayes (NB). We propose Random MNL (RMNL) as a new bagged classifier combining a forest of MNLs estimated with randomly selected features. Analogously, we introduce Random Naive Bayes (RNB). We benchmark the predictive performance of RF, RMNL and RNB against state-of-the-art SVM classifiers. RF, RMNL and RNB outperform SVM. Moreover, generalizing RF seems promising, as reflected by the improved predictive performance of RMNL.
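The abstract does not include pseudocode, but the Random MNL idea it describes, bagging combined with random feature subsets around a multinomial logit base learner, can be sketched as follows. This is a minimal illustration, not the authors' implementation; the estimator count, feature-subset size, and the use of scikit-learn's LogisticRegression as the MNL base learner are all assumptions for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

n_estimators, n_feats = 25, 2  # illustrative: each MNL sees 2 of the 4 features
models, feat_subsets = [], []
for _ in range(n_estimators):
    # bootstrap sample of the training rows (bagging)
    rows = rng.integers(0, len(X_tr), len(X_tr))
    # random feature subset for this base MNL
    cols = rng.choice(X_tr.shape[1], n_feats, replace=False)
    m = LogisticRegression(max_iter=1000).fit(X_tr[np.ix_(rows, cols)], y_tr[rows])
    models.append(m)
    feat_subsets.append(cols)

# combine the forest of MNLs by majority vote
votes = np.stack([m.predict(X_te[:, c]) for m, c in zip(models, feat_subsets)])
pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
acc = (pred == y_te).mean()
print(f"Random MNL (sketch) test accuracy: {acc:.2f}")
```

Swapping the base learner for `sklearn.naive_bayes.GaussianNB` in the same loop would give the analogous Random Naive Bayes sketch.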


Similar Resources

Random Forests for multiclass classification: Random MultiNomial Logit

Several supervised learning algorithms are suited to classify instances into a multiclass value space. MultiNomial Logit (MNL) is recognized as a robust classifier and is commonly applied within the CRM (Customer Relationship Management) domain. Unfortunately, to date, it is unable to handle huge feature spaces typical of CRM applications. Hence, the analyst is forced to immerse himself into fe...


Random Forests for Big Data

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involve massive data but they also often include data streams and data heterogeneity. Recently some statistical methods have been adapted to process Big Data, like linear regression models, clustering methods and bootstrapping schemes. Based o...


Mixing Weak Learners In Semantic Parsing

We apply a novel variant of Random Forests (Breiman, 2001) to the shallow semantic parsing problem and show extremely promising results. The final system has a semantic role classification accuracy of 88.3% using PropBank gold-standard parses. These results are better than all others published except those of the Support Vector Machine (SVM) approach implemented by Pradhan et al. (2003) and Ran...


One Class Splitting Criteria for Random Forests

Random Forests (RFs) are strong machine learning tools for classification and regression. However, they remain supervised algorithms, and no extension of RFs to the one-class setting has been proposed, except for techniques based on second-class sampling. This work fills this gap by proposing a natural methodology to extend standard splitting criteria to the one-class setting, structurally gene...


Ensembles of Binary SVM Decision Trees

Ensemble methods are able to improve the predictive performance of many base classifiers. In this paper, we consider two ensemble learning techniques, bagging and random forests, and apply them to Binary SVM Decision Tree (SVM-BDT). Binary SVM Decision Tree is a tree based architecture that utilizes support vector machines for solving multiclass problems. It takes advantage of both the efficien...



Publication year: 2007